Giga Games 1

home *** CD-ROM | disk | FTP | other *** search

/ Giga Games 1 / Giga Games.iso / net / go / comp / longroad < prev next >

Wrap

Text File | 1993-06-20 | 15.1 KB | 288 lines

THE LONG ROAD AHEAD Difficulties in Machine Intelligence by Peter Dudey (pdudey@willamette.edu, hagbard on the IGS) a Willamette University Undergraduate Research Grant 28 August 1992 ACKNOWLEDGEMENTS I would like to thank the University for funding me in this work, and Professor Jim Levenick for advising me. I was greatly aided by the insights of David Fotland and the other programmers at the North American Computer Go Championship, and the Go players on the Internet Go Server who tested an commented on Fumiko. All those involved in setting up the server and maintaining the Go directory at the University of Washington were also helpful. Finally, I would like to thank Ian Osgood for joining me in my continued work. GENERAL OUTLINE I have spent the past three months applying machine learning techniques to the Japanese game of Go. I have learned a great deal about Go, programming in general, Go programming in particular, and intelligence. My insights on these last two topics are delineated in "On Intelligence And Learning." I have produced a working Go program, "Fumiko", which can be found on the public Go directory at the University of Washington computer system (milton.u.washington.edu on the internet). The program is somewhat competent, and defeated David Stoutamire's "Goprogram" in an untimed game over the internet. Fumiko is rather slow, though, and lost all of her rounds at the North American Computer Go Championship, held here in August at the U.S. Go Congress, by running out of time. (I gave Fumiko a feminine name, and use such pronouns, because she would usually play black, and by Taoist symbolism black is feminine.) Fumiko was never able to learn; this turned out to be too massive a problem, for reasons that will be explained in the section entitled "Why Is This So Difficult?". Although Fumiko was only completed in early August, I have learned so much, in writing her and at the Congress, that in retrospect she seems primitive and clumsy. (By comparison with the other programs in the Championship, she was.) Although the period of this grant is ending, I am continuing my work: I am working with Ian Osgood, a programmer from Beaverton, on a new program, "HellGo". This is discussed in greater detail in "Future Work." ON INTELLIGENCE AND LEARNING In the machine learning class I took from Professor Levenick last semester, a great deal of time was spent on the question of what it means to be intelligent. I believe intelligence involves the ability to ask oneself and answer three questions: 1) What's going on? 2) What should I do? 3) How should I do this? The third question may be applied to itself recursively. If a person decides they want to get rich by selling widgets, they must ask themselves how they are going to sell them. If they decide to sell them by advertising in the back of Soldier of Fortune magazine, they must consider how to go about placing such an ad. For a program to learn, it must be able to answer an additional question: 4) What effect will this have? When this question is answered, a prediction is being made, and this allows for use of the scientific method: make a prediction, test it (by performing the action decided upon), correct the prediction to whatever extent it didn't hold, and repeat the process. Correcting the prediction is the essence of learning. If a person or program keeps making the same wrong prediction, they are certainly not learning. How to make such a correction, however, is a diabolically difficult problem. The primitive learning algorithms we implemented in the machine learning class had it fairly easy; after they chose an action (implicitly predicting that this was the right answer), we would tell them what they should have done. If they got it wrong, they would alter their action-choosing mechanism to be more likely to choose that answer if faced with the exact same problem. This learning would generalize somewhat, in that the programs would usually then respond in a similar way to similar problems. In Go, things are not so simple, for two reasons: we don't know the "right" answer, and the situation is so complex that it's difficult to tell exactly what went wrong. In a simple game like Tic-Tac-Toe, it may be said that a move is right or wrong: correct moves lead to victory (or at least a tie), incorrect moves to defeat. Most humans, and simple computer programs, can work out every possible sequence of moves up to the end of the game. If a move will lead to the player winning regardless of how the opponent responds, it may be said to be the correct move. A human cannot do this with a game like Chess. There are simple too many possibilites--there are roughly 10^450 possible games. One may just try to "look ahead" further than the opponent, and make moves which will lead to a good board position. This, however, depends on being able to accurately judge how good a board position is. If one is going to consider three moves that each player might make, for each of the next ten turns, one must perform 310 such judgements. The best Chess programs do work by brute force: looking ahead many more moves than the opponent and using special-purpose hardware to make simple judgements of how good a board position is (e.g., counting pieces). This approach does not work in Go. The number of possible games is vastly higher (10^750), there are more legal moves in a turn (several hundred, compared to a dozen or so in Chess), there are more moves in a game (around 300, compared to about 40), and judging a board position is much more difficult. (This last has to do with the more significant interplay between pieces in Go.) Not only can we not tell a Go program what the right answer is, deciding on a move is a sufficiently complicated process that it is difficult to say exactly what went wrong. By way of analogy, suppose a woman walks into the street and is hit by a truck. She survives, and when she awakes in the hospital her husband tells her, "I hope you learned your lesson!" What is the lesson to be learned? Why did she fail in her task of not being hit by a truck? Did she fail to look both ways before crossing the street? Did she look, but not see the truck because she wasn't wearing her glasses? Was her timing off in running ahead of or behind the truck? Did she fail to put on high-visibility clothes that morning? Was it not her fault at all, but that of the truck driver? Was it some combination of these? Is her husband correct in telling her to change her behavior, or does he think the accident was divine retribution for some otherwise unrelated action? If something bad happens in a Go game (a group is captured or the game is lost), there is often not one particular move to blame. Also, when a bad move is made, the consequences may not be immediately apparent, as with the woman choosing not to wear her Day-Glo vest that day and then being flattened several hours later. Even if a bad move can be spotted, it is not always clear why it was bad. It may have been a bad way to attempt to save a group, or it may have saved the group but nonetheless been a bad move because it ignored some other, more urgent situation elsewhere on the board. Finding the weak link(s) in a chain of reasoning is difficult even for humans. Finally, even if we can find out what went wrong and why it was wrong, current computer programs are not good at figuring out how to change the offending behavior. The approach being most rigorously explored right now involves telling the program to, in the future, rely more on those parts that tend to offer good advice. This does not create any new solutions to the lowest-level problems, but makes the best possible use of those techniques the program has. Fortunately, in a limited realm such as a game, there is a level at which the program doesn't need to find new ways to do things, and there are things it will always want to do. The rules of the game provide a finite list of possible ways to try to do anything (the legal moves), and the overriding goal is always to win the game. There would be no point in the program being able to override these limits. In larger contexts, such as life itself, it can be argued that there are no such limits; there are always new ways to do something, and we haven't found any places to stop asking, "Why?" A program that did not have upper and lower limits on these chains of reasoning would not merely be intelligent, it would be sentient. WHY IS THIS SO DIFFICULT? One thing I have gained from this project is an appreciation of the sheer immensity of the problem of computer Go. Playing the game has provided careers for generations of professionals; the best programs, which are over 80,000 lines long and encompass decades of person-years of work, play only as moderate amateurs. Why, one might wonder, is Go so difficult to program? Why can't we just ask a master player how they choose their moves, and then use the computers calculating power better and faster? The biggest reason is that good players don't know why they choose the moves they choose. If asked, they will often respond, "It just looked right." I have found three particular obstacles, things which even begining human players do better than any extant programs: grouping, spotting salient details, and making intuitional judgements. By grouping, I mean recognizing groups of stones which are near each other and not cut apart by enemy stones. Even an amateur can perform this task instantaneously and with a fairly high degree of accuracy. This is difficult to program because of the vagueness of such concepts as "near". In Go, pairs of stones can be more distant than individual stones and still be in the same group, and enemy stones in the area have an effect which is hard to explicitly define. Even if a threshold distance is known, finding the distance between two stones would involve the Pythagorean theorem; this only takes a fraction of a second, but if it must be done several thousand times a turn it can be cripplingly slow. People are also very good at spotting things which are important or different. Although we ignore most of the cars we see on the road, we could spot a police car instantly. The most efficient way for a computer to spot it would be to check every car seen. It is not clear whether humans are doing this subconsciously, but if we are we are simultaneously doing that and thousands of other things (checking for odd smells, looking for anyone we know, listening to the radio, watching street signs . . . ) at amazing speed. Within a field, this ability to jump straight to the important part is what makes for expertise. A musician can immediately pick out the melody in a song, ignoring the other notes. An English teacher can spot spelling errors in a ten-page essay; if he is checking every word against some internal dictionary, he's doing it fast enough that he still has time to comprehend the essay and look for a thesis. A Go player needn't try to defend a group that isn't threatened, and doesn't consciously check each group every turn. Because we don't process (or process very quickly) most of the information that comes our way, we can make snap judgements, coming up with a good answer without having complete information. For example, if a person unwittingly bites into a jalapeo, they can instantly grab a glass of water and quaff it. From a programming standpoint, this involves going through the list of items on the table, comparing the quenching potential of water with that of other available beverages, considering whether the clear liquid is in fact hydrochloric acid, and so on. Whether the person is not doing this or is doing it very quickly is not clear. One type of snap judgement particularly useful in Go is the ability to spot the few good moves. After a few seconds, a player will have discarded all but two or three of the hundreds of legal moves. Again, they are not consciously going through each possible move; they are thinking, "I either want to do this, or that, or that." Here, perhaps, is an area where computers have the advantage: a person could not remember analyses of fourteen possible moves. In writing Fumiko, I spent most of my time on the sequencer, the part which would read out possible sequences of plays. At the Congress, Dave Fotland pointed out that this was not a good idea. Reading out sequences is very time-consuming, and time spent considering consequences of dumb moves is essentially wasted. Most of the better programs concentrate on selecting moves to consider. If a program can spot a half-dozen good moves intuitionally, even playing one of those at random is not likely to mess up the game horribly. When professional Go players take on several opponents at once, they may be doing something akin to this. While they are much faster, snap judgements are not as accurate as complete analysis, and can lead to closed-mindedness. If the best answer is unorthodox, it may never be considered, may be thrown out with the bad answers. Often, as in the jalapeo example, we don't have time to perform a complete analysis; other times, we don't have enough information on the topic (as in judging a person), and we really don't have time to fully analyze all of the information we get, which is why we must ignore so much of it. An intelligence must walk a fine line; too much analysis takes too long, and too little may miss the best answers--especially if our intuitions are flawed. FUTURE WORK As mentioned in the introduction, this will not be the end of my work in computer Go. Work on "HellGo" has commenced, and it will feature several improvements over Fumiko: First, and perhaps most importantly, the data structures will be smaller, faster, and more efficient. This is made possible by the programming experience I have gained working on Fumiko, and on the advantage of having a collaborator. To give just one example, the structure used to store the locations of the stones on the board is nearly six times smaller than it used to be, and appears to be a little faster. We will not ignore the grouping question, as I did with Fumiko. Already, HellGo has a loose concept of groups, and we will be improving this, writing sections to detect whether stones or chains of stones are connected. More emphasis will be put on the move selector, the part which decides which moves should be considered in reading out sequences. These parts in Fumiko were rather sloppily thrown together as the Congress approached, and she relied on the sequence reader--a bad idea for reasons explained earlier. We do hope to get HellGo to learn, or at least be teachable. Currently, I envision a program where, if it made a bad move, the human opponent could ask it why it did so, and then point out the weak link in a displayed chain of reasoning. Finally, we will be spending more time on this program than I was able to spend on Fumiko; it may be several years before it is completed. We intend to seek further funds from other sources to continue our work. In any case, I for one will be doing this in my spare time for the forseeable future.